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ABSTRACT Prior to the epidemic that emerged in Haiti in October of 2010, cholera had not been documented in this country. Af- 
ter its introduction, a strain of Vibrio cholerae Ol spread rapidly throughout Haiti, where it caused over 600,000 cases of disease 
and > 7,500 deaths in the first two years of the epidemic. We appUed whole-genome sequencing to a temporal series of V. chol- 
erae isolates from Haiti to gain insight into the mode and tempo of evolution in this isolated population of V. cholerae Ol. Phy- 
logenetic and Bayesian analyses supported the hypothesis that all isolates in the sample set diverged from a common ancestor 
within a time frame that is consistent with epidemiological observations. A pangenome analysis showed nearly homogeneous 
genomic content, with no evidence of gene acquisition among Haiti isolates. Nine nearly closed genomes assembled from 
conttnuous-long-read data showed evidence of genome rearrangements and supported the observation of no gene acquisition 
among isolates. Thus, intrinsic mutational processes can account for virtually all of the observed genetic polymorphism, with no 
demonstrable contribution from horizontal gene transfer (HGT). Consistent vsdth this, the 12 Haiti isolates tested by laboratory 
HGT assays were severely impaired for transformation, although unlike previously characterized noncompetent V. cholerae iso- 
lates, each expressed hapR and possessed a functional quorum-sensing system. Continued monitoring of V. cholerae in Haiti 
will illuminate the processes influencing the origin and fate of genome variants, which wiU facilitate interpretation of genetic 
variation in future epidemics. 

IMPORTANCE Vibrio cholerae is the cause of substantial morbidity and mortaUty worldwide, with over three miUion cases of dis- 
ease each year. An understanding of the mode and rate of evolutionary change is critical for proper interpretation of genome 
sequence data and attribution of outbreak sources. The Haiti epidemic provides an unprecedented opportunity to study an iso- 
lated, single-source outbreak of Vibrio cholerae Ol over an established time frame. By using multiple approaches to assay ge- 
netic variation, we found no evidence that the Haiti strain has acquired any genes by horizontal gene transfer, an observation 
that led us to discover that it is also poorly transformable. We have found no evidence that environmental strains have played a 
role in the evolution of the outbreak strain. 
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I lihrio cholerae is a pathogen of considerable public health con- 
If cern because of its potential to cause large epidemics and pan- 
demics and its high case fatality rate when the disease is left un- 
treated. The disease cholera is caused by Y. cholerae strains of 
serogroups Ol and 0139 that can produce a potent enterotoxin, 
cholera toxin, which is encoded by the cfxAB genes on the bacte- 
riophage CTX(/) (1). Seven pandemics of cholera have been re- 
corded since 1817, when the disease first emerged from the Bay of 
Bengal and spread around the globe (2). The current seventh pan- 
demic of V. cholerae originated in Southeast Asia and has spread 
across the globe in several waves of transmission (3) . In October of 



2010, cholera made its appearance in Haiti. Prior to 2010, there 
were no documented cases of cholera in that country, despite the 
devastating outbreaks occurring in the Caribbean in the 19th cen- 
tury (4). Its introduction to the island of Hispaniola following the 
earthquake that occurred there in January of 2010 has resulted in 
the largest epidemic of cholera in recent times: 604,634 cases and 
7,436 deaths were documented in the first two years of the epi- 
demic (5). 

Initial epidemiological and genetic studies focused on the ori- 
gin of the Haiti epidemic and quickly attributed the outbreak to 
human introduction of a V. cholerae Ol strain from outside the 
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region, most likely South Asia (6, 7). Epidemiological investiga- 
tions pointed to Nepalese troops serving as United Nations (UN) 
peacekeepers as the source of cholera, based on reports of unsan- 
itary conditions at the UN camp, the spatial-temporal pattern of 
disease clusters, and the coincidence of the outbreak with the ar- 
rival of the UN troops from Nepal (8). Phylogenetic analysis of 
time-relevant isolates from Haiti and Nepal provided additional 
support for the hypothesis that the epidemic strain was imported 
from Nepal (9). 

The single-source introduction and geographic isolation of the 
Haiti epidemic, along with the extended duration of the outbreak, 
provide an unprecedented natural experiment for characterizing 
in detail the intrinsic tempo and mode of genome evolution in this 
deadly pathogen. We performed whole-genome sequencing on a 
set of well-characterized isolates collected near or after the 1-year 
anniversary date of the Haitian outbreak, and compared them 
with isolates collected early in the epidemic to gain insight into the 
dynamics of genome evolution in V. choleme Ol. The sample set 
includes isolates collected at different time points and in different 
localities (9, 10) as well as phenotypically and genotypically dis- 
tinct isolates discovered during routine laboratory surveillance by 
the Centers for Disease Control and Prevention. The variants that 
have arisen in the course of the outbreak include various pulsed- 
field gel electrophoresis (PFGE) pattern combinations, serotype 
Inaba (11), an altered antibiotic susceptibility pattern (ASP), and 
a nonagglutinating (NAG) V. cholerae strain. We first conducted 
phylogenetic analysis to determine whether the diverse set of iso- 
lates were all part of the same outbreak and then used the genome 
sequences to compare gene content and structural arrangement of 
chromosomes. 

RESULTS 

We sequenced 23 genomes on the lUumina platform (see Table SI 
in the supplemental material). The sample set represents geo- 
graphically dispersed isolates collected over an array of time 
points and representing multiple PFGE pattern combinations (see 
Fig. SI to S3 in the supplemental material). Eighty-seven genomes 
were downloaded from the Sequence Read Archive (SRA) (see 
Table S2 in the supplemental material); two were found to have 
>20% non-Vibrio genetic material and were excluded from the 
study: hc-17al and hc-77al. Comparing the 108 genomes yielded 
566 core genome single-nucleotide polymorphisms (SNPs). Of 
the 23 isolates, we sequenced 9 on the PacBio platform and rese- 
quenced the reference strain 2010EL-1786. 

A phylogenetic tree constructed from the 566 core SNPs 
grouped all Haiti isolates and three Nepal isolates (14, 25, 26) in a 
single monophyletic group within the context of a global collec- 
tion of 108 Vibrio cholerae Ol strains (see Fig. S4 in the supple- 
mental material). Next, we uncovered 45 high-quality SNPs (hqS- 
NPs) in the Haiti-Nepal group. The minimum spanning tree 
(MST) constructed from the hqSNPs was concordant with the 
clustering of isolates by maximum likelihood analysis (Fig. 1 ; also, 
see Fig. S4 in the supplemental material). The MST illustrated the 
radiation of numerous lineages from a single sequence type that 
predominated in the early part of the epidemic. 

We then examined the 45 hqSNPs for potential effects on func- 
tion (see Table S3 in the supplemental material). Most notable was 
a GAA-to-TAA substitution in the wbeT gene of 2012EL-1410, a 
representative of five serotype Inaba isolates from our Haiti col- 
lection. The substitution introduces a premature stop codon into 



the gene, which predicts a truncated protein, a result that is con- 
sistent with other studies showing that serotype conversion results 
from mutations in the wbeT gene (12). Comparison of three dif- 
ferent molecular clock models showed that overall the changes at 
nucleotide sites were consistent with the epidemic behavior, as the 
highest likelihood was obtained under the exponential growth 
model. Using a strict molecular clock, analysis of 10^ states from 
eight independent runs yielded a median estimate of the date of 
the most recent common ancestor to be 28 September 2010 (95% 
credible interval, 23 July 2010 to 17 October 2010). 

Variation in gene content and structural arrangement. Few 
differences in gene content were observed (Fig. 2). The BLAST 
atlas showed no evidence of gene acquisition, but a few deletions 
were apparent, and the assembly of long reads showed similar 
results. In addition, three large inversions in or around the SXT 
region were evident in the long-read assemblies (Fig. 3). Amplifi- 
cation across the 3' end of the inversion boundaries in isolates 
with rearrangements and in the reference strain 2010EL-1786 
confirmed the structural variation observed in the assemblies. 

Quorum sensing and transformation. To determine whether 
the Haiti clone was capable of natural transformation, one mech- 
anism of horizontal gene transfer (HGT), standard laboratory 
DNA uptake assays (18) were performed on twelve isolates. Vir- 
tually no kanamycin-resistant (Kan"^), Lac~ transformants were 
detected (Table 1) for Haiti isolates in assays using DNA from 
reference V. cholerae strain C6706 with a kan gene disrupting the 
lacZ gene (14). C6706, which is capable of quorum sensing (QS), 
transformed at an efficiency 10' to lO"* times greater than that of 
each of the Haiti isolates and a QS-deficient C6706 A.hapR strain. 
Similar results were observed when we attempted to transform 
each Haiti isolate with its own kan genomic DNA (Table 2), or 
with C6706 genomic DNA with an ampicillin resistance gene dis- 
rupting the lacZ gene (data not shown). Thus, it appears that the 
Haiti clone not only failed to acquire any genes by HGT but also 
was poorly transformable by standard laboratory DNA uptake 
assays. We examined the sequences of 47 genes that are involved in 
the QS and other signaling systems known to control transforma- 
tion (14); each gene was present, and there were no loss-of- 
function or nonsense mutations in these genes or their promoters 
(data not shown). Also, each Haiti isolate was experimentally 
shown to be QS proficient, as expression of a QS-dependent re- 
porter gene introduced into each of the 1 2 Haiti strains was similar 
to that of the positive control (PC) C6706 strain, while the nega- 
tive control (NC) C6706 AhapR strain was > 1,000-fold impaired, 
which corresponds to the detection limit (data not shown). 

DISCUSSION 

We used genomic approaches to characterize evolutionary 
changes in the V. cholerae Ol population following the introduc- 
tion to Haiti. Our results were consistent with previous findings 
that show that the Haiti cholera outbreak is clonal and that Nep- 
alese isolates are the closest relatives to the Haiti strain identified 
to date (9), even when placed in a phylogeny with a larger collec- 
tion of isolates representing recent cholera epidemics. A previous 
study based on WGS (9) showed that Nepalese isolates were al- 
most indistinguishable from Haiti isolates; however, that phylog- 
eny did not include isolates recovered from recent cholera out- 
breaks. The phylogeny presented here provides evidence that the 
observed variants that were detected by our surveillance are part of 
the same outbreak and not representatives of secondary introduc- 
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FIG 1 Minimum spanning tree (MST) constructed from 45 hqSNPs. Intermediate hypotitetical ancestors were inferred for some lineages and manually placed 
in the network. Nucleotide substitutions are indicated by dashes along the branches, and deletion events are indicated by arrows. ASP, antimicrobial susceptibility 
pattern. The -10 kb deletion in the SXT is inferred twice in the model, and occurs between two transposase genes, VCH1786_I0078 and VCH1786_I0087. *, 
nonsynonymous transversion. 
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FIG 2 BLAST atlas. Genes from each de novo assembly were compared against both chromosomes of the completed genome 2010EL-1786. Absence of color 
(white) indicates that a strain is missing genetic material that is present in the reference. Regions of interest are denoted by black rectangles on the circumference. 
Three genomes have a shorter lUumina read length (36 bp), which might have resulted in fewer gene predictions, which manifests as an artifact on the atlas in the 
superintegron region. The isolate names are listed from the outermost to the innermost tracks. 
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FIG 3 Circos plot showing structural genome rearrangements. Each alternating white/ gray band represents one of the nine strains sequenced on the PacBio RS 
platform, with an inner tenth track representing the pubHshed reference strain { 7) . Each band contains three tracks. The outermost, orange tiles represent contigs 
from de novo assembhes ahgned back to the 2010EL-1786 reference. Inversions within contigs are highlighted in blue. Black line plots show local alignment 
quality over Ikb windows, in terms of Phred QV, for long read assembled contigs compared to the 2010EL-1786. Note the smallest dips in QV correspond to 
single SNPs. Green line plots show local coverage of lllumina short reads mapped to the reference over 500bp windows. The innermost band contains a purple 
track showing contigs from a separate long read de novo assembly of 2010EL-1786. The black QV track highUghts differences between the long read and short 
assembly, indicating potential areas of misassembly in the original reference. 
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TABLE 1 Haiti isolates 


are defective for transformation with C6706 : 


gDNA witli kan at lacZ 








Recipient strain 


Tea 


Range 


roid reduction'" 


C6706hcZikan) 


C6706 


<1.3E-5 


1.9E-5-7.8E-6 


1 


WQ6lacZ(kan) 


2010EL-1786 


<3.8E-9 


- 


>3,428 


C6706lacZ{kan) 


2010EL-1799 


<2.5E-9 


- 


>5,200 


C6706/flcZ(fcfl«) 


2010EL-2026 


<1.9E-9 


- 


>6,948 


C6706lacZ{kan) 


2011EL-1818 


<1.6E-9 


- 


>8,196 


C6706lacZ{kan) 


2011EL-1841 


<1.5E-9 


- 


>8,392 


C6706lacZ{kan) 


2011EL-2319 


<5.4E-9 


- 


>2,399 


/^^TM^J ''7/1 \ 

L6706lacZ(kan) 


2011EL-2320 


<1.3E-9 




>9,588 


C6706lacZ{kan) 


2011EL-2321 


<1.4E-9 




>9,508 


C6706lacZ{kan) 


2011EL-2322 


<3.6E-9 


<DL-8.6E-9 


>3,621 


C67Q6lacZ(kan) 


2011EL-2323 


<1.8E-9 




>7,223 


C67Q6lacZ(kan) 


2012EL-1410 


<7.3E-10 




> 17,648 


C6706hcZ(kan) 


2011EL-1300 


<9.5E-10 




>13,635 


C6706hcZ{kan) 


C6706 \hapR 


<6.5E-9 




>2,001 



" TF, average transformation frequency from triplicate samples. 

^ DL (detection limit) was < 1 .OE-8 for all experiments. "-" indicates transformants below the DL. 1 CFtJ rather than 0 was used to calculate TF. 
Fold reduction = TF of recipient/TF of C6706. 



tions. The synthesis of PFGE and sequence data demonstrated the 
utihty of WGS in establishing the clonahty of isolates that exhibit 
greater PFGE dissimilarity than would normally be attributed to a 
single-source outbreak. The use of high-resolution sequence data 
that are amenable to evolutionary analysis will greatly enhance our 
ability to discern transmission pathways of virulent clones, such as 
the one implicated in this epidemic. 

The nucleotide polymorphism that we detected was consistent 
with the observed epidemic behavior, as the most supported 
model of population growth was exponential. The molecular 
clock calculated with this model estimated a most recent common 
ancestor date of 28 September 2010 (95% credibility interval [CI], 
23 July 2010 to 17 October 2010) (see Fig. S5 in the supplemental 
material). The credibility interval encompasses the date that the 
Nepalese soldiers arrived in Haiti (9 October 2010) (8), as well as 
the first reported hospitalization of a cholera case (17 October 
2010) (although an earlier fatal case with an onset date of 12 Oc- 
tober may have been the index case) (15). The consistency be- 
tween the molecular data and the epidemiological information 
demonstrates the utility of advanced statistical tools in outbreak 
investigations where epidemiological information maybe lacking. 
Our results suggest that a population genomic approach can be 
very powerful in delimiting the time frame of an outbreak. 

We observed remarkably few differences in the genetic reper- 
toire of the V. cholerae Ol population (Fig. 2). All genes from Haiti 
isolates were found in the genome of the reference strain (20 lOEL- 
1786), and the differences in gene content could be attributed to 
loss of genetic material. The NAG isolate (2012V-1060) had a 
unique ~ 1 0-kb deletion, and closer examination revealed that sev- 
eral key components were missing from the rfb region (see Fig. S6 
in the supplemental material). The isolate therefore appeared to 
be a serogroup 0 1 strain that was unable to synthesize or transport 
O antigen to the cell surface. Five representative strains were ob- 
served to have large deletions in the SXT, an ~100-kb integrative 
conjugative element that exhibits considerable diversity in gene 
content (16). One ~ 10-kb SXT deletion was found in three isolates 
(Fig. 1; also, see Fig. S7 in the supplemental material) that also 
shared an altered ASP. The typical ASP for Haiti V. cholerae in- 
cludes intermediate resistance to chloramphenicol and nonsus- 
ceptibility to streptomycin, sulfisoxazole, trimethoprim/sulfame- 



thoxazole, and nalidixic acid, resistance traits that are encoded by 
floR, strA, strB, sul2, dfrAl, and a point mutation in gyrA (17). The 
isolates with the altered ASP displayed nonsusceptibility only to 
nalidixic acid, and both PGR and genome analysis confirmed the 
loss offloR, StrA, strB, and sul2 resistance genes in the SXT region 
of these isolates. The deletion could not be placed parsimoniously 
on the MST (Fig. 1 ); however, it falls within variable region III ( 1 6) 
(see Fig. S7) and is flanked by transposase genes, so it is reasonable 
to suppose that the same deletion occurred independently in two 
different lineages. The SXT deletion could be made parsimonious 
by inferring that the nonsynonymous transversion (Fig. 1, aster- 
isk) reverted to its original state, an event that we concluded is less 
likely to occur. We note that a parsimonious reconstruction indi- 
cates loss of SXT genes in the three closest Nepal isolates after 
divergence from the common ancestor of the Nepal-Haiti cluster 
(Fig. 1), as the other two most closely related Nepal lineages from 
Hendriksen et al. (9) (Nepal-2 and Nepal-3) possess an intact SXT 
(see Fig. S7). The large deletions in the SXT and rfb regions (Fig. 3) 
were confirmed by the continuous-long-read (CLR) assemblies. 
The nearly complete assemblies from the GLR and the BLAST 
atlas were both consistent with the notion that the Haiti V. chol- 
erae strain has not acquired genes or genomic islands. Thus, we 
found no evidence that unrelated bacterial strains in the environ- 
ment have contributed to the diversification of the Haiti outbreak 
strain. 

It is well accepted that HGT is a major force driving evolution 
in bacteria, including Vibrio (18, 19); thus, the lack of HGT ob- 
served in our study might be surprising. Limited data suggest that 
changes can accumulate over a relatively short time frame (20), 
and a previous study of V. cholerae from Haiti (10) reported accu- 
mulation of diversity early in the epidemic although gene acqui- 
sition was not specifically demonstrated in V. cholerae serogroup 
Ol. In Vibrio cholerae, natural transformation, an important 
mechanism of HGT, occurs on chitinous surfaces and requires 
quorum sensing (QS) (i.e., the HapR QS transcription factor) (13, 
14). Thus, we first considered whether transmission dynamics 
precluded the establishment of epidemic V. cholerae in the envi- 
ronment so that they lacked exposure to other Vibrio and to chitin, 
which are critical for HGT by transformation (12). Although it is 
possible that environmental factors could limit HGT between en- 



6 mBio' mbio.asm.org 



July/August 2013 Volume 4 Issue 4 e00398-13 



Evolutionary Dynamics of V. cholerae 



TABLE 2 Haiti isolates are defective for transformation with self-derived tn{kan) gDNA" 



gDNA donor 


Test sample 






Control 






Fold reduction*" 


Recipient strain 


TF 


Range 


Recipient strain 


TF 


Range 


2010EL-1786 


2010EL-1786 


<5.1E-9 




C6706 


5.3E-6 


4.7E-6-5.7E-6 


> 1,046 


2010EL-1799 


2010EL-1799 


<2.1E-9 




C6706 


1.8E-5 


1.1E-5-1.9E-5 


>8,462 


2010EL-2026 


2010EL-2026 


<1.7E-9 




C6706 


6.0E-6 


1.8E-6-1.0E-6 


>3,484 


2011EL-1818 


2011EL-1818 


<2.1E-9 


<DL-2.5E-9 


C6706 


1.4E-5 


8.8E-6-1.9E-5 


> 6,842 


2011EL-1841 


2011EL-1841 


<8.1E-9 


<DL-1.3E-8 


C6706 


1.9E-5 


1.2E-5-2.5E-5 


>2,314 


2011EL-2319 


2011EL-2319 


<2.0E-9 




C6706 


1.4E-5 


5.6E-6-1.7E-5 


>7,087 


2011EL-2320 


2011EL-2320 


<1.9E-9 




C6706 


1.7E-5 


1.1E-5-1.8E-5 


>8,826 


2011EL-2321 


2011EL-2321 


<5.4E-9 


<DL-9.9E-9 


C6706 


1.9E-5 


9.3E-6-6.2E-5 


>3,437 


2011EL-2322 


2011EL-2322 


<5.7E-9 


<DL-3.3E-9 


C6706 


l.OE-5 


1.1E-6-2.3E-5 


>1,789 


2011EL-2323 


2011EL-2323 


<3.7E-9 


<DL-1.9E-9 


C6706 


l.lE-5 


4.7E-6-1.1E-5 


>2,966 


2012EL-1410 


2012EL-1410 


<4.2E-9 




C6706 


4.8E-5 


3.7E-6-5.6E-6 


>1,161 


2011EL-1300 


2011EL-1300 


<1.6E-9 




C6706 


9.4E-6 


5.7E-6-1.4E-5 


>5,960 


2011EL-1300 


C6706 ^hapR' 


<6.4E-9 




C6706 


9.4E-6 


5.7E-6-1.4E-5 


>1,476 



" TF, DL, and - are as defined in Table 1. 
^ Fold reduction = TF(test)/TF(control). 

"The C6706 LhapR recipient was transformed with gDNA from 2011EL-1300. 



vironmental and epidemic V. cholerae, they are likely in contact, as 
coisolation has been observed on several occasions from environ- 
mental and clinical samples (10) (Vince Hill [CDC], personal 
communication). We therefore tested the alternative hypothesis 
that the Haiti V. cholerae strain could be deficient in natural trans- 
formation and therefore unable to take up and incorporate for- 
eign DNA. The transformation experiments on chitin showed that 
unlike naturally competent clinical strain C6706, each Haiti iso- 
late was severely impaired in its ability to take up extracellular 
DNA derived from C6706 (Table 1) or from itself (Table 2). Fur- 
ther experiments confirmed that the isolates, like C6706, were 
quorum-sensing proficient and expressed hapR (data not shown). 
Thus, the Haiti strain appears to be limited in its ability to acquire 
new genetic material through transformation, but this is not due 
to QS deficiencies, as have been identified in other nontransform- 
able v. cholerae isolates (13) (B. K. Hammer and E. E. Bernardy, 
unpublished results). It remains possible that V. cholerae isolates 
defective for transformation in lab settings may, nonetheless, be 
naturally competent in nature. However, to our knowledge no 
such strains have yet been described. Also, a longer time frame 
may be required for recombination to occur with more distantly 
related lineages and create mosaics that are successful enough to 
be observed. We note that the low rates of transformation would 
presumably not affect HGT via other mechanisms such as phage 
transduction or conjugation. 

In summary, the Haiti cholera epidemic provided a unique 
opportunity to study the evolutionary dynamics of an isolated 
population of V. cholerae Ol. Although PFGE was initially critical 
for determining that a single strain caused the outbreak, subse- 
quent changes in PFGE patterns precluded our ability to deter- 
mine that observed variants traced back to a single epidemic 
founder, an issue that was addressed by using WGS in our com- 
prehensive surveillance strategy. The sample set was virtually ho- 
mogeneous in gene content, an observation that led to the discov- 
ery that the Haiti strain is poorly transformable. Thus, our study 
indicates that transformation by unrelated environmental strains 
of v. cholerae has played no detectable role in the evolution of the 
outbreak strain. Further studies will define the genetic muta- 
tion(s) that rendered the Haiti strain defective in HGT via natural 
transformation. Once the mutation(s) is identified, its temporal 



origin and global prevalence can be determined to understand 
more about the stability and success of this particular genotype 
and the role that transformation plays in its genome evolution and 
success in establishing itself in a new environment. Atypical Ol El 
Tor V. cholerae strains such as the Haiti strain have already dis- 
placed prototypical El Tor strains and emerged as the predomi- 
nant clone circulating in Asia and Africa (21-24). These strains 
have acquired multidrug resistance and enhanced virulence traits 
such as classical or hybrid CTX prophage and SXT-ICE, resulting 
in higher infection rates and harsher symptoms (25). With the 
tools such as WGS now being available for epidemiological sur- 
veillance and case tracking, we argue for renewed efforts aimed at 
cholera prevention to avert more widespread and difficult-to- 
treat cholera outbreaks. 

MATERIALS AND METHODS 

Bacterial isolates. A total of 23 bacterial isolates were chosen for whole- 
genome sequencing based on phenotypic and genetic diversity that was 
observed during routine molecular surveillance as previously described 
(26, 17) (see Table SI in the supplemental material). Clinical isolate 
C6706, the isogenic \hapR mutant, and the C6706 derivative carrying kan 
at the lacZ site (27) were from our strain collection. 

Genome sequence determination. Single-molecule, real-time 
(SMRT) sequencing was performed on the PacBio RS platform using 
SMRTbeU libraries targeting 10-kb inserts using 90-min movies and C2 
chemistry (Pacific Biosciences, Menio Park, CA) using previously estab- 
lished methods and commercially available chemistries (28). Single-end 
70- and 100-bp lUumina reads were generated on the GAllx platform 
(lUumina, San Diego, CA) using standard procedures. 

Genome data acquisition and initial data processing. lUumina runs 
of publicaUy available genomes were retrieved from the Sequence Read 
Archive (SRA) and the closed genome of 2010EL-1786 from GenBank 
(accession numbers NC_016445.1 andNC_016446.1). 2010EL-1786 is an 
early outbreak isolate from Haiti, which we sequenced, closed, and anno- 
tated ( 7 ) . To improve assembly quality, we trimmed poor quality from the 
ends of reads and removed reads of bad quality using the script run 
_assembly_trimClean.pl from CG-Pipeline (CGP) with default options or 
with a minimum length of 30 bp and no trimming for the 36-bp read sets 
(29). De novo assembly was performed on resulting reads using CGP, 
which assembles with Velvet (30). Optimal parameters were determined 
for each assembly using VelvetOptimiser with a fc-mer range from 27 to 
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63 bp. Coding gene predictions were prepared with CG-Pipeline, which 
uses a comprehensive gene prediction approach (29). 

Genomes of >4.2 Mb were analyzed for potential contamination by 
comparing the contigs against the RefSeq database of microbial genomes 
using BLASTn. 

Variant calling and annotation. lUumina reads were mapped against 
2010EL-1786 using the SMALT mapper (31), and variants were called 
with FreeBayes (32). Analysis parameters for calling high-quality SNPs 
(hqSNPs) were optimized by manually reviewing the pUeups versus vari- 
ant calls for seven random genomes analyzed under different conditions. 
The following parameters were used: smalt index -k 13 -s 6; smalt map -f 
samsoft; freebayes --pvar 0 — ploidy 1 — left-align-indels — min-mapping- 
quality 0 — min-base-quality 20 — min-alternate-fraction 0.75. We re- 
moved indels and SNPs with a depth of coverage less than 10 from variant 
calls. The set of SNPs that passed our filters comprise the final set of 
hqSNPs used in all subsequent analysis. The set of 45 hqSNPs within the 
core genome of the Haitian and the three closely related Nepalese ge- 
nomes were annotated with snpEff (33). 

Core genome phylogeny. The Haiti isolates were first examined in a 
broad phylogenetic context by constructing a tree containing 108 ge- 
nomes with PhyML using the K80 substitution model, best of NNI and 
SPR for tree topology searching, and SH-like branch supports (34) (see 
Fig. S4 in the supplemental material). A phylogeny was constructed for all 
genomes clustering with the original Haiti genomes (7, 10) plus the three 
related Nepal genomes (9) using the same approach (« = 63). 

Bayesian analysis. The analysis of the date for the most recent com- 
mon ancestor (MRCA) was based on the alignment of 45 hqSNPs from 32 
Haiti genomes with known DOCs (see Tables SI and S2 in the supplemen- 
tal material). To estimate the date of the outbreak, the sequences were 
analyzed using BEAST1.72 with an HKY model (35) and 10* iterations of 
the Markov Chain Monte Carlo Simulation. To minimize the number of 
parameters estimated, we used a strict molecular clock with a starting 
estimate of 3E-4 hqSNPs/site/day, as estimated by Path-o-Gen (36). We 
tested three different growth models: constant population size, expansion, 
and exponential and compared models using Bayes factors based on mar- 
ginal likelihoods sampled from the posterior. TreeAnnotator, with a 
burn-in of 1,000 trees, was used to find the best-fitting tree with default 
parameters. 

BLAST atlas. The BLAST atlases were constructed using lUumina- 
only assemblies relative to a pan-genome, initiated by a concatenated and 
closed assembly of 2010EL-1786 (accession numbers NC_016445.1 and 
NC_016446.1). The pangenome for the BLAST atlas was constructed by 
iterative comparisons of gene predictions from each of the assemblies 
using BLASTn against the genes in the pangenome set (37). Any genes 
among the other assemblies which did not have a hit to the pangenome 
with >80% identity and > 100 nucleotides were added to the pangenome 
set. 

Detecting rearrangements with PacBio assemblies. Ten Haiti iso- 
lates, including the reference strain 2010EL-1786, were sequenced on the 
PacBio RS platform using standard C2 chemistry and protocols. For each 
strain, reference-based alignments as well as de novo assemblies were com- 
pared to the closed reference genome of Haiti isolate 2010EL-1786 (7). 
The reference-based approach to detect structural variants was performed 
as described previously (28). A novel assembly method was employed that 
enabled high-quality de novo assembly using solely continuous-long-read 
(CLR) sequencing data from the PacBio RS platform. The assembly pipe- 
line has three main components: preassembly, assembly, and assembly 
polishing (38). 

The preassembly step utilizes the error correction framework from the 
AHA pipeline (39). However, rather than correcting the long reads with 
high-quality short reads, as previously described, long reads are used to 
generate a high-accuracy consensus of other long reads (28). Specifically, 
the full CLR data set was divided into subsets to include the longest CLRs. 
The length cutoff for each strain was set to obtain at least 10 X corrected 
read coverage; depending on the sequencing depth and read length pro- 



files, these size cutoffs ranged from 4 to 6 kb. This read subset was then 
corrected using the complete CLR set for each strain. 

The resulting corrected (or "preassembled") reads were trimmed to 
eliminate any low-quality regions in the consensus for each read. The 
resultant trimmed long reads were then size selected again to obtain at 
least 10 X long-read coverage of the genome (cutoffsrangedfrom3 to 6kb 
for the preassembled reads). These high-quality reads were passed directly 
into the Celera Assembler (40). 

We performed a final finishing step using Quiver from Pacific Bio- 
sciences (https://github.com/PacificBiosciences/GenomicConsensus), 
which leverages specific quality values and features unique to single mol- 
ecule sequencing. For each strain, aU raw reads were aligned back to the 
de novo assembled output from the Celera Assembler using BLASR (41). 
Consensus calling was performed using Quiver to obtain the final finished 
assemblies. 

Sequencing for the superintegron. To validate the PacBio assemblies, 
we employed Sanger sequencing of the integron region of one isolate. 
Fosmid libraries were constructed from genomic DNA fragments of 
2011EL-2320 using the PCC2FQS Vector (Epicentre, Madison, WI). Li- 
braries were screened using attC-targeted primers (reference PMID 
17464063) to find integron-containing fragments. Transposons were in- 
troduced into the selected fosmids with the EZ-Tn5 <KAN-2> insertion 
kit (Epicentre) and sequenced using EZ-Tn5-specific primers. Geneious 
v6.0.3 (Biomatters Ltd., Auckland, New Zealand) was used to assemble 
sequences to generate the full-length integron sequence. The Sanger con- 
sensus sequence was compared to the PacBio assembly. 

Confirmation of genome rearrangements. We used PGR to confirm 
the inversions observed in the CLR assemblies for strains 2011EL-2319, 
2011EL-2320, and 2011EL-2323. The sequences around the inversion 
breakpoints were compared to that of reference strain 2010EL-1786, and 
nucleotide alignments were viewed in MEGA5 (42). The primer se- 
quences (5' to 3') are as follows: 2011EL-2319, GCATTATATC- 
CCGTCGTTTA and GATCAACTAGCTGGAATAAA; 2011EL-2320, 
GCATTATATCCCGTCGTTTA and TCTGTCAATGAATACGCAGA; 
and 2011EL-2323, AGTTTATGATTATGAATAGTGA and GAAACACT 
AATACAACTAGCC. 

Chitin-induced natural transformation assay for HGT. As described 
previously (13, 14), sterile crab shells in triplicate wells were inoculated 
with each V. cholerae strain in a 12-well plate and provided with 2 fxg of 
genomic DNA (gDNA) marked with a kanamycin resistance (kan) gene. 
Following a 24-h incubation, attached cells were harvested and plated to 
quantify transformation frequency (TF), defined as kan CFU ml^Vtotal 
CFU ml^'. Experiments were performed in triplicate. For the experi- 
ments whose results are shown in Table 1 , each strain was provided with 
donor gDNA from a C6706 derivative with kan at the lacZ locus, and the 
fold TF defect was calculated relative to C6706. In a separate assay (Ta- 
ble 2), twelve pools of donor gDNA were generated from > 1,000 
Tn5(/tfl«) mutants of each isolate, and each pool was used to transform 
that same isolate and C6706. The fold TF defect was calculated for each 
isolate relative to C6706 that was also incubated with Tn5{kan) gDNA 
from that isolate. Transposon mutagenesis of C6706 and the Haiti isolates 
was performed as described elsewhere (43). 

Quorum-sensing assay. As described (44), a quorum-sensing re- 
porter plasmid (pBBl) was introduced into each isolate, and then tripli- 
cate cultures were grown overnight at 30°C. Luciferase levels and optical 
density at 600 nm (ODggg) were determined for each culture to calculate 
the relative light units (RLUs), expressed as (lux/ml)/(ODgoo units/ml). A 
quorum-sensing assay was also performed as previously described (44) 
(data not shown). 

SUPPLEMENTAL MATERIAL 

Supplemental material for this article may be found at http://mbio.asm.org 
/lookup/suppl/doi: 10.11 28/mBio.00398- 1 3/-/DCSupplemental. 

Figure SI, PDF file, 0.4 MB. 

Figure S2, PDF file, 0.7 MB. 
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